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Abstract 


The spread of invasive species can have far reaching environmental and 
ecological consequences. Understanding invasion spread patterns and the 
underlying process driving invasions are key to predicting and managing 
invasions. We develop a set of statistical methods to characterize local 
spread properties and demonstrate their application using historical data on 
the spread of the gypsy moth, Lymantria dispar, and hemlock wolly adelgid, 
Adelges tsugae. Our method uses a Gaussian process fit to the surface of 
waiting times to invasion in order to characterize the vector field of spread. 
Using this method we estimate with statistical uncertainties the speed and 
direction of spread at each location. Simulations from a stratified diffusion 
model verify the accuracy of our method. We show how we may link local 
rates of spread to environmental covariates for our two case studies. 


Introduction 


When a non-native species successfully establishes in an exotic environment it en¬ 
ters the spread phase of biological invasions during which it expands its range into 
suitable habitat (Lockwood et ah, 2013). Ecological theory has shown that the 


speed of invasion spread is a joint function of the dispersal rate and the popula¬ 


tion growth rate of the invading species (Skellam, 1951 Okubo and Okubo, 1980); 


any habitat characteristic that influences population growth or dispersal can thus 
influence the rate of spread. Rates of spread may vary considerably among species 
and for a given species, spread rates may vary across heterogeneous landscapes 


(Shigesada et ah, 1987 Tobin et ah, 2007b). Understanding the mechanisms cans 


ing heterogeneity in the rate of invasion spread is key to predicting future rates of 
spread and to identifying important locations for management. 

In this work we develop new methods for estimating local speed and dominant 
direction of spread along the invasion front. Our approach can be applied to 
identify statistically signihcant environmental and geographic determinants of local 
invasion rates. 

In addition to environmentally-driven heterogeneity in rates of spread, there 
is considerable variation among species in the extent to which invasion spread is 
continuous. Spread of some species occurs via continuous expansion of the range 
into contiguous areas. For example, the North American muskrat. Ondatra zibeth- 
ica, invaded central Europe from 1905-1927 via gradual expansion of its range in 


concentric circles (Skellam, 1951). The spread of other species is highly discon 
tinuous, characterized by a pattern referred to as stratified diffusion (Shigesada 


et ah, 1995); following initial establishment, expansion may happen with long- 


range jumps into isolated uninvaded areas, founding new colonies which expand 
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and eventually coalesce to form a contiguously invaded zone. This pattern is ob¬ 
served in many species of invading organisms, such as invasion of North America 


by the Argentine ant, Linepithema humile 

Suarez et ah, 

2(101 

) and the gypsy 

moth, Lymantria dispar ( 

Sharov et al., 

2002 




Quantifying the spread of non-native species and relating invasion speed to 
habitat heterogeneity is important for predicting and managing biological inva¬ 
sions. Several methods have been developed for measuring spread based upon 
htting range size to time since establishment or estimating spread by directly 


quantifying displacement of range boundaries over time (Sharov et al., 1997 Tobin 


et al. 

2007a 

Gilbert and Liebhold 

2010 


These methods are generally well-suited 
for quantifying average spread range and temporal variation therein, but they are 
limited in their ability to quantify local spread rates and their relation to local 
habitat characteristics. Also, these methods are generally designed to quantify 
spread as a continuous process; identihcation of long-range jumps in stratihed dis¬ 
persal is usually done visually in a non-automated fashion. These gaps in existing 
methodology provides our motivation for developing new and statistically rigorous 
methods for estimating local speed and direction of spread. We take advantage 
of recent statistical theory on the estimation of spatial gradients. We test our 
methods on simulated data generated from a stratihed diffusion model and apply 
them to two detailed case studies of biological invasions, the historical spread of 
the gypsy moth and the hemlock wolly adelgid, Adelges tsugae, in North America. 


Data 


Gypsy moth 

Native to Europe and Asia, the gypsy moth was accidentally introduced from 


France to Massachusetts in the late 1860’s (Liebhold et ah, 1989), it has since 


spread throughout much of the northeastern US. The gypsy moth is now estab¬ 
lished in a large area composed of the north Atlantic states and bordering Cana¬ 
dian provinces, as well as a second focus resulting from a long-range jump event 
to Michigan around 1980 (Liebhold et al.| 1992 Johnson et al., 2006 Tobin et al. 

mjbj ). 

The invasion of the gypsy moth across North America has been rather slow 


compared to the rate of spread of many other alien species (Liebhold and Tobin 


et al. 


2008|). Mean spread was estimated at 21 km per year from 1960 to 1990 (Liebhold 
|1992). The relatively slow rate of spread can be attributed, in part, to 


the fact that females of North America populations are flightless. Gypsy moth 
populations spread by short-range windborne dispersal of 1®* instar larvae through 
a process known as ‘ballooning’ (Mason and McManus, 1981). Egg masses are also 
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accidentally transported on wood or hnman-made objects, forming new colonies 


ahead of the invasion front and resulting in a pattern of stratihed diffusion (Sharov 


et ah, 2002). 


The full invasion history of the gypsy moth in the US is reflected in the year of 
government designation of gypsy moth quarantine by county. County-level quar¬ 
antine records for gypsy moth are maintained by the United States Department 
of Agriculture (U.S. Code of Federal Regulations, Title 7, Chapter III, Section 
301.45). Historically, an entire county was usually designated part of the quaran¬ 
tined area when established gypsy moth populations were hrst detected anywhere 
within the county. These records are updated annually and exist from 1934 to the 
present. From 1900 to 1934, the year when counties were hrst infested has been 
described in various other published sources (e.g., Burgess, 1913, 1915; Liebhold 


et al. 


1992|). As additional covariates, we used county-level data (|Liebhold et al. 

rised of oaks, v 
of each county. 


1997) on the percent of the forest basal area comprised of oaks, which is a favored 
food plant of the gypsy moth, and the size (km^) 


Hemlock wolly adelgid 

Hemlock woolly adelgid (HWA) is an insect species responsible for mass defoliation 


of its host trees, eastern hemlock and Carolina hemlock (Orwig et al., 2002 Morin 


et al. 

2009 

). Native to East Asia, it was first discovered in the USA in Virginia in 

the 1950s ( 

Ward et al. 

2004 

). HWA life stages can be transported by wind, wildlife. 


especially birds, and humans. Since its discovery, it has gradually expanded its 


range into much of the northeastern USA (Evans and Gregoire, 2007 Morin et al. 


2009). By 1969 it was found in southern Pennsylvania and it invaded southern 


New England by 1985, spreading at an estimated speed of 20-30 km/year (Morin 


et al., 2009). 


As with the gypsy moth, historical spread of the HWA was recorded at the 
county level. Records from the US Forest Service Forest Health Protection are 
available for 1951, 1971, 1981, 1996, and from 2001 to 2011. We use the basal area 


of hemlock (Morin et ah, 2004) and plant hardiness zone (Cathey, 1990) for each 


county as additional covariates for our analysis. 


Methods 


Previous estimates of spread for gypsy moth data have estimated rates averaged 


over space. Liebhold et al. (1992) estimated spread rates for hve geographic regions 


by the slope of a least-squares regression of time on distance to a reference point 
in each region. Spread rates have also been estimated by measuring the average 
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displacement of range boundaries over time (Sharov et ah, 1997 Tobin et al 


2007a). 


Previous research on estimating spatial gradients from georeferenced biological 
data has focused on detecting zones or boundaries of rapid change across space 
using geostatistical wombling (Womble, 1951). Wombling methods involve esti¬ 


mating local vector gradients by fitting bilinear functions over a lattice of points. 


This method has been applied to genetic (Barbujani et ah, 1989) as well as eco¬ 


logical (Fortin, [l994 ) data. More recent wombling methods for areal data feature 
Bayesian hierarchical spatial models in order to identify signihcant boundaries af¬ 


ter accounting for spatial dependence via Markov random helds (Banerjee et al 


2004 Fitzpatrick et ah, 2010 Lu et ah, 2007), with applications to ecology and 


epidemiology. 

The use of spatial gradients to estimate biological spread is motivated by the 
fact that if the surface is the waiting time to first appearance, then the reciprocal 
of the gradient length is a measure of the invasion speed: Fast spread leads to shal¬ 
low waiting time surfaces, while slow spread results in steep surfaces. Previously 


Johnson et al. (2004) estimated spread gradients using thin plate spline applied 


to waiting times (as measured by wavelet phase angles) to study outbreak dy¬ 
namics of the larch budmoth. Farnsworth and Ward ( 2009[ ) used a similar spline 
surface approach to study spread of avian influenza. The thin plate spline ap¬ 
proach yielded gradients which reflect the magnitude and direction of the spread, 
a simple general-purpose approach for visualization, but has not associated errors 
with local spread estimates which prevents rigorous inference regarding whether, 
for example, any observed spatial variation is statistically significant. 


Gaussian process gradient models 


Banerjee et al. (2003) developed theory for estimating gradients of an arbitrary 


surface by specifying a Gaussian process model directly for the spatial gradients. 
We take advantage of this because when the gradients are defined on the waiting 
times to hrst appearance, the reciprocal of the gradient will represent the rate of 
spread. 

We develop methods to rigorously test important features of the invasion, fol¬ 


lowing existing theory (Banerjee et ah, 2003 Banerjee and Gelfand, 2006). Below 


we further describe pattern recognition methods for detecting sites of long-range 
dispersals, radial expansion and directional patterns in historical spread. We also 


provide, in an electronic supplement, computer code for an R (Ihaka and Gentle 


man 


1996; R Gore Team, |20l3 ) software package that automates inference about 


spread. 

We assume we have observations of the year of hrst appearance Y = {Y (si),..., R(s„)} 
at locations {si,..., s„}, Sj G For our examples, data are county-level quaran- 
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tine records and the spatial locations {si,s„} are taken to be the centroids of 
counties for the gypsy moth where n = 571 counties (Figure]^) and for the HWA 
where n = 340 counties (Figure [^). Coordinates are projected using the Albers 
equal area conic projection with standard parallels 29 30' and 45 30'. For county 
i, Y (sj) is the year the county was added to the quarantine. We assume Y (s) can 
be modelled using an isotropic Gaussian process with mean /i(s) and covariance 
K(-). 


Banerjee et ah (|2003|) defines the hnite difference directional derivative process 

hu) 


at location s for scale h in direction u as 

F(s 


Y^,h{s) = 


Y{s) 


h 


where w is a unit vector. The directional derivative process in direction u is then 
defined as 

DuY{s) = hmYu,h{s) 
h^O 

assuming the mean square limit exists. Of interest is the gradient of Y (s) at s, 
the vector of directional derivatives 

VYis) = iD,,Yis),D,,Yis)) 

in orthonormal basis directions = (1,0) and ei = (0,1). [Banerjee et al.| (2003) 
shows there exists a joint trivariate Gaussian process for ^(s) and Vy(s), and 
therefore Y and VW = {VF(si),..., VF"(s„)} have a joint multivariate normal 
distribution which takes the form 


Y 

VY 


W 


3n 




K{D) -VK{D) 


where D is the n x n matrix of pairwise Euclidean distances of s, and K{D) 
represents the n x n matrix of K{-) applied element-wise to H. is the length 
2n vector 

.I-.) “ 


VK{D) is the n x 2n matrix 


f(-) 


and Hk{D) is the 2n x 2n matrix 

/ d^K 

dx"^ 
d^K 


\ dxdy 


(D) 

(D) 






dy'^ 
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The conditional distribntion of the gradient is therefore given by 

VY|Y, 0 ~ N^n {Vfi - VK{Df[K{D)]-\Y - fi), -Hk{D) - VK{Df [K{D)]-^VK(D)) . 


where 0 is a vector of mean and covariance parameters for the Ganssian process. 
Letting 5 = (sq — si ,sq — s„), the estimated gradient at a point sq is given by 

VY(so)|Y, 0 ~ Y2(V/i(so) - VK{5f[K{D)]-\Y - ^), 

-Hk{ 0) - VK{6f[K{D)]-^VK{6)). 


Note that for the gradient process to be well-defined, the original process mnst 
be mean sqnare differentiable and all second order partial derivatives of K mnst 
exist. This is the case, for instance, when K is chosen to belong to the Matern 
family with smoothness parameter v > 1 (Stein, 1999). 

For onr applications, we assnme the original process Y (s) = p(s) -|-w(s) -|-e(s), 
with mean function /i(s) = (3q + + P 2 Syi correlated spatial error w{s) ~ 

GP(0,iF(-)) with Matern covariance smoothness v = 3/2, which takes the explicit 
form K{r) = cr^(l -|- (/r)exp{—(/r}, and uncorrelated error e(s) ~ Y(0,r^), where 
is a nugget effect capturing both mesurement error and microscale variability. 

We infer the mean and covariance parameters 0 = (/3o,/di,/52, 0, t^) based 

on a Bayesian approach using a Markov chain Monte Carlo (MCMC) algorithm. 
Flat prior distributions are assumed for mean parameters and inverse gamma priors 
are assumed for the partial sill (a^) and nugget (r^) with shape parameter 2 and 
scale parameter set to an approximate value from the empirical semivariogram. 
For the range (0) a uniform prior is chosen with a support that allows the process 
to vary from low to high dependency. Simulation from the predictive distribution 
of the gradient can then be done by composition; given each posterior sample of 
0, a sample for VY(so) can be drawn from ([^. 

At a given location sq, posterior samples are the gradient at sq in the x and y 
directions. When the data are times of first appearance, steeper gradients corre¬ 
spond to slower speeds. Therefore posterior estimates of the gradient (including 
Bayesian credible intervals) allow us to make inferences on the local speed and 
dominant direction of spread. Therefore the speed of spread is the inverse of the 
magnitude of the posterior gradient ||VY(so)||, and the dominant direction of the 
spread is in the direction of the gradient VY(so)/||VY(so)||- 


Detecting sources and long-range jumps 

It is useful to have automated methods to identify candidate locations for foci of 
long-range jumps ahead of the advancing front as different from locations with 
contiguous “wave-like” diffusive spread. However, consistently identifying these 
foci is a challenge. We have identified two candidate solutions. The hrst borrows 
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methods from circular statistics, and the second is a more direct test of the average 
gradient orthogonal to a curve around the point. 

The Rayleigh test (see e.g. Jammalamadaka and Sengupta, |2001 ) is a statistical 
test of whether a circular distribution is random or non-random. When applied 
to the vectors of spread near a point, a non-random distribution implies a unified 
directional spread through that point. We take directions of spread in a neighbor¬ 
hood around each centroid and test if these directions are drawn from a uniform 
circular distribution. Say we have n estimated directional vectors of spread yi 
in a neighborhood around a point. Let r be the length of the mean vector from 
this sample, r = The test statistic is given hy R = 2nr‘^ for the test of 

Hq : Directional vectors are distributed randomly 
Ha : Directional vectors are distributed non-randomly. 

Under the null hypothesis R will be distributed with 2 degrees of freedom. If 
the test fails to reject the null this may contribute evidence that the directional 
distribution is uniform as would be the case for radial spread from the point. While 
this may occur because the location is the point source of a long-range jump, the 
Rayleigh test will also flag areas with vectors that are converging to the point (akin 
to an ecological sink) or are truly random. Therefore while it is useful for flagging 
potential sites of long-range jumps, it is not a perfect method because we need to 
visually distinguish between the three different scenarios that all lead to failure to 
reject the null. 

We also test directly the distribution of the gradient around a point, which will 
allow us to check if there is signihcant radial expansion around that point. 

Define a curve Ct* = {s(f) : t G [0,f*]}, where s{t) = (si(f), S 2 (t)) G TZ^ and 
s\t) is the componentwise derivative. Let ri{s{t)) be the unit vector normal to the 
curve at the point s{t). The total gradient normal to Ct* is 


T(f)= {VF(s(()),r,(s(()))<i!>, 


where v is the arc-length of the curve, v{t*) = ||s'(f)||(if, and so 


r(u)= / (vY(smrj(sm\\s'mdt, 

Jo 


In Banerjee and Gelfand (2006) it is shown that the distribution for the total 


gradient over a curve is a Gaussian process on [0,T], T{t*) ~ GP(/ir(U),iLr(-,-)) 
with 

Mt*) = f {Ks{t)),v{s{t)))\\s'{t)\\dt 
Jo 


KT{t\Hl) = / / r]^{s{ti))HK[s{t 2 ) - s{ti)]r]{s{t 2 ))\\s'{ti)\\\\s\t 2 )\\dtidt 2 

Jo Jo 








where fi{-) is the mean of the original process Y{s) and Hk{-, ■) is the hessian of 
the covariance of y(s). 

The conditional distribution of interest is 
r(«-)|Y, e~N(^r- 7?(i-)[A-(£>)]-‘(Y - ^), Kr(t\f) - 7?'(f)|A'(B)]-V(r)) 
where for j = 1 ,n 

= cov{T{t*),Y{sj)) = f {K [s{t) - s{j)] ,r]{s{t)))\\s'{t)\\dt. 

Jo 

The terms 7r(t*) and Kr{t*,t*) are not available analytically, and must be com¬ 
puted using numerical integration. The average gradient normal to the curve Ct* 
is simply the total gradient T(t*) divided by the arc-length. 

Using this method centered on each location we can test the gradient normal 
to four sides of a box with sides of length r. As an heuristic we say if the spread 
is signihcantly out of at least two sides of the box, and does not go signihcantly 
into any side of the box, we will flag the location as a potential site of a long-range 
jump. We do this for a grid over the region of interest. In our hgures, red lines 
indicate sides where there is a signihcant outward spread. 


Driving factors of spread 


We can gain insight into the mechanisms of spread by relating the geographic 
variation in spread to characteristics of habitat. To account for spatial dependence 
we £t a Bayesian spatial regression model to log-speeds of spread speed using the 


R package spBayes (Finley et ah 


2007). We apply the log transformation to the 

If the mean speed at 


response since the speeds have right-skewed distributions, 
location Sq is given by U(so), then we assume 

logU(so) = X'^{so)(3 + w{so) e(so) 


where W(s) is a vector of the spatially varying environmental and geographical 
covariates of interest. w{s) ~ GP{0, G{-)), G(-) has Matern covariance smoothness 
with smoothness ly, range </> and partial sill and e(s) ~ 77(0, r^). Priors are 
selected as before and joint estimation is done via MCMC for {/3, cr^, 0, r^, z/}. 


Results 

Gypsy moth 

Significant speeds and directions of historical spread of the gypsy moth are plotted 
at the locations of each county in the quarantine in Figure [^. The mean speed 
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of spread over all counties is 20.8 km/year, with a median of 13.4 km/year. Dis¬ 
tributions for the magnitude of spread at each location tend to be right skewed, 
where the average length of 95% credible intervals is 29.7 km/year. 

In Figure we test for sites of long-range dispersals using the two methods 
discussed above. For each method in practice we take circular neighborhoods of 1°. 
Points identified by the Rayleigh test are marked in black in Figure [^, along with 
boxes where the gradient test is significant. Both methods identify three potential 
sites around the northeastern coast, Michigan and central-western Pennsylvania. 
Prior analysis conhrms two of these sites, as the population was first introduced 
in Massachusetts in the 1860s and a discrete population was later established in 
Michigan (Liebhold et ah, 1992). A close look at Figure also highlights the 


jump to Centre County, PA in the mid 1970s. 

We relate speed of spread to latitude and longitude, quarantine date, county 
size, and finally the percent basal area of trees preferred as hosts of the gypsy 
moth. Estimated parameters of the spatial regression model are given in Table 
01- We verify that on average the gypsy moth spread faster as it moved west. We 
also found that basal area of susceptible host trees is significantly associated with 
speed of spread, consistent with the concept that local growth rates will be larger 
in the face of more favorable habitat, and should consequently enhance invasion 
spread rates. 


Hemlock woolly adelgid 

Significant speeds and directions of spread for the HWA are plotted at each county 
in Figure [^. We find a mean speed of spread of 20.5 km/year across counties, 
with a median speed of 13.5 km/year. The 95% credible intervals for spread at 
each location have an average length of 31.8 km/year. 

Sites of long-range dispersal events are identified in Figure [3 ]d. We detect ar¬ 
eas of apparent long-range dispersals in Richmond and southern PA, suggesting 


a pattern of stratihed diffusion. Morin et ah (2009) also found that expansion is 
signihcantly influenced by availability of host trees. Low winter temperatures can 
cause extensive mortality in HWA populations and limit expansion to the north 


(Trotter and Shields, 2009). Therefore we relate speeds of spread to environmen¬ 


tal features including the presence or absence of hemlock trees, and the average 
plant hardiness zone for each county, an index based on the mean annual min¬ 
imum winter temperature (Cathey, 1990). Estimates from the regression model 


are given in Table 03. We observed evidence that historically expansion is faster 
to the west and north. We also find as in Morin et ah (2009) that spread is 


signihcantly associated with the abundance of host trees. We also tested the in¬ 
teraction between plant hardiness zone and latitude and found at a given latitude, 
HWA spread signihcantly slower through areas with lower plant hardiness zones 
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[/3 = 3.4(0.4,6.3)]. 


Simulation 


We tested the ability of our method to recover the effects that spatially varying 
habitats have on the speed of spread. To accomplish this, data are simulated from 
a stratihed diffusion model following [Shigesada et ah (1995). Stratihed diffusion is 
a combination of neighborhood diffusion and long distance dispersal. As the size 
of the original colony expands, new colonies are more likely to be created by long 
distance migrants. 

The simulation starts with a single colony, centered at the initial point of 
invasion. The occupied area grows out in a circle, the radius r growing at constant 
rate c. This colony can then form offspring colonies from long-distance migrants 
in a random direction at a distance L from the invasion front. New colonies form 
at a rate A(r) that is a function of the colony radius. These offspring colonies grow 
at speed c and form offspring colonies of their own. 

We begin with an initial introduction in Massachusetts in the year 1900. Colony 
range expansion c varies by longitude to simulate a slow period of initial expansion; 
c = 10 km/year east of —78° and c = 20 km/year west of —78°. New colonies form 
at rate A(r) = O.lr a distance L = 10 km from the invasion front. Additionally, to 
mimic the observed Gypsy Moth data an artihcial long-range jump is introduced 
in Michigan in 1950. The simulation is run for 107 years with an annual timestep. 

The time until the invasion front reaches each county is recorded as the sim¬ 
ulated quarantine data (Figure]^). In Figure]^ we observe that our method of 
inference has successfully identihed the two hxed colony introductions as regions 
of long-range jumps. We recover mean spread rates in the west of 10.7 km/year 
and in the east of 21.4 km/year, close to the actual values used in the simulation. 


Discussion 


To study the establishment and spread of biological invasions we present a new 
method to estimate local rates and direction of spread, and identify key spatial 
features including sources, sites of rapid spread and long-range jumps. We visualize 
and make inferences on historical patterns of spread of the gypsy moth and hemlock 
woolly adelgid as well as validate the methodology on simulated data. Posterior 
inference in a Bayesian setting allows us to test the signihcance of spread patterns 
and spatial features of these invasions in a rigorous way. 

Taking our local estimates of gypsy moth spread and averaging them across 


time yields results in line with previous estimates in the literature (Liebhold et ah 


1992). We hnd an average speed of 11.4 km/year across counties quarantined from 
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1900 to 1915, followed by a slow spread (5.0 km/year) across counties from 1916 
to 1965 and then a period of very rapid expansion (25.8 km/year) from 1966 to 
2000. These changes may also be related to the differences in Allee effects among 
different regions along the invasion front as evidenced in Tobin et ah (2009). From 
2000 to present, coincident with the “Slow the Spread” program of control (Sharov 


et al., 2002) we observed an average speed of 14.6 km/year. 


Our estimates for the spread of HWA when spatially averaged are also in line 
with estimates from the literature (e.g. Ward et al.| (2004)). There is evidence 


that HWA range expansion is limited both by a lack of host trees and in the north 
by extreme winter temperatures. We also observe the spread of HWA to appear 
less diffusive than the gypsy moth. One explanation for this is the important 
role of wildlife, especially migratory birds, as a mechanism for HWA dispersal 
(McClure, 1990) in addition to windborne dispersal and human transport, which 
also accounts for anisotropy in spread observed in Morin et al. (2009). 

Our abilities to identify patterns of spread are constrained by the spatial and 
temporal resolution of our data. County-level quarantine data are typically coarser 


than, for example, gypsy moth pheromone trap count data, though Tobin et al. 


(2007a) showed the two sources of gypsy moth data provided similar estimates. 
Additionally, the original Gaussian process must be sufficiently smooth for a gra¬ 
dient process to exist (we take the Matern model with smoothness parameter 
V = 3/2), with the consequence that some information is lost at small scales. We 
rely for the most part on annual records, but before 2001 the range of HWA was 
recorded at less frequent intervals. This is a potential source of bias in our early 
estimates of HWA spread. 

For large spatial datasets, fitting a Gaussian process is a computational bur¬ 
den. Once the original Gaussian process is fit, however, we can draw samples by 
composition from the gradient process quickly. When the number of spatial loca¬ 
tions is in the thousands we must likely have to rely on approximations such as 


the predictive process model of Banerjee et al. (2008). 


Generally whenever the data are point-referenced waiting times, the speeds of 
spread can be estimated from the inferred gradient process. Therefore the methods 
presented here are generally applicable in ecology and epidemiology to spread 
of invasive species and infectious disease. These methods are also potentially 
applicable to non-invasion problems such as population waves such as the spread 
of an advantageous allele (Fisher, 1937), or recurrent outbreak waves in species 
such as the larch bud moth ( pohnson et al. 2004). An R package that automates 
the inference is available as an electronic supplement. 
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Table 1: Results of a spatial regression of speeds of spread (km/year) for the 
gypsy moth (a) and hemlock wolly adelgid (b) including posterior means and 95% 
credible intervals obtained using the highest posterior density interval algorithm 


(Chen et ah, 2000) 


(a) Gypsy Moth 

/S 

Intercept 

0,9.3) 

Longitude 

-5.1(-8.1,-2.2) 

Latitude 

-2.3(-6.6,1.2) 

County size 

-0.00007(-0.00020,0.00002) 

Quarantine date 

0.0006(-0.0044, 0.0056) 

Basal% susceptible trees 

0.0023(0.0000,0.0042) 

(b) HWA 

/S 

Intercept 

19.5(3.1,36.6) 

Longitude 

-9.8(-14.9,-4.8) 

Latitude 

8.5(1.7,16.0) 

Quarantine date 

-0.003(-0.009, 0.003) 

Ipresence of hemlock 

0.09(0.01,0.07) 

Plant hardiness zone 

0.014(-0.19,0.23) 
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Figure 1: Year of first appearance by county for the gypsy moth (a) and hemlock 
woolly adelgid (b). 
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Figure 2: (a) Patterns of spread of the gypsy moth. Arrows indicate local speeds 
and directions of spread, and are plotted where spread is signihcant. (b) Identifying 
long-range jumps for the gypsy moth invasion. Black points are potential sites 
of long-range jumps identified by the Rayleigh test. Red boxes around a point 
indicate regions of potential long-range jumps identihed by the grid search method 
(lines indicate sides of a box around a potential source where the spread was 
signihcantly out of the box). 
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Figure 3: (a) Patterns of spread of the hemlock wolly adelgid. Arrows indicate 
local speeds and directions of spread, and are plotted where spread is signihcant. 
(b) Identifying long-range jumps for the gypsy moth invasion. Black points are 
potential sites of long-range jumps identihed by the Rayleigh test. Red boxes 
around a point indicate regions of potential long-range jumps identihed by the 
grid search method (lines indicate sides of a box around a potential source where 
the spread was signihcantly out of the box). 
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Figure 4: (a) Waiting times of the stratified diffusion simulation following Shige- 


sada et al. (1995). (b) Long-range jumps for the simulated invasion. Black points 


are potential sites of long-range jumps identihed by the Rayleigh test. Red boxes 
around a point indicate regions of potential long-range jumps identihed by the grid 
search method. 
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